- My origins in programming, data science, and open science
- Improving reproducibility, collaboration and communication in environmental science with open science tools
- Resources and recommendations
October 21, 2016, Hopkins Marine Station, Stanford University
Reproducibility is foundational to science, but we rarely test it, even with our own work.
Fig of headlines (does this slide fit here)
Data Science:
"an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge" (Grolemund & Wickham 2016)
Open Science:
"the concept of transparency at all stages of the research process, coupled with free and open access to data, code, and papers" (Hampton et al. 2014)
I primarily use these tools:
TODO: images
This talk: jules32.github.io/opensci-talk

Science:
Data science:
* Thankfully, I had wonderful programming mentors:
Steve Haddock, Dave Foley, Ashley Booth
TODO: image
2012: OHI method and first global assessment (Halpern et al. 2012)
2013: second annual global assessment
We expected to easily reproduce our previous work. We had planned ahead:
We struggled to reproduce our work using standard approaches to reproducibility and collaboration
Lowndes et al. in prep: Improving reproducibility, collaboration, and communication in environmental science using open science tools
underscore the importance of:
"Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in the mundane labor of collecting and preparing data, before it can be explored for useful information." - NYTimes (2014)
Ultimate goal: Tidy data (Wickham 2014)
Before
After
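As a rough illustration of the before/after idea: the sketch below reshapes a small "wide" table (one column per year) into tidy form, where every row is a single observation. The talk's own workflow is in R; plain Python is used here only to show the shape of the data. The table, its values, and the `to_tidy` helper are hypothetical and do not come from the OHI assessments.

```python
# Hypothetical "before" table: one row per region, one column per year.
# The regions and scores are invented for illustration only.
wide_rows = [
    {"region": "A", "2012": 60, "2013": 62},
    {"region": "B", "2012": 71, "2013": 69},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows into tidy rows: one observation per row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on each output row
            tidy.append(
                {
                    id_column: row[id_column],
                    variable_name: column,
                    value_name: value,
                }
            )
    return tidy


# "After": each row holds one region, one year, and one score.
tidy_rows = to_tidy(wide_rows, "region", "year", "score")
for tidy_row in tidy_rows:
    print(tidy_row)
```

In the tidy layout, adding another year means adding rows rather than columns, so code that filters, groups, or plots by year does not need to change.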
Data wrangling with the grammar of data manipulation and tidy data
Grolemund & Wickham 2016: R for Data Science
TODO: version control quote
Before
final.csv and final_JL-2016-08-05.csv
After
git
TODO: Collaboration quote
Before
After
Before
After
These tools and this workflow make our work possible.
All on ohi-science.org
1 - Learn to code
   - in R
   - with RStudio
2 - Use version control
   - git
   - with GitHub
   - through RStudio
Introduce these concepts incrementally: evolution not revolution
Books, trainings, and webinars that helped me:
Recent academic publications:
THANK YOU
email: lowndes@nceas.ucsb.edu
twitter: @juliesquid
talk url: https://jules32.github.io/opensci-talk
15-minute version of this talk at WSN: Friday, Nov 11, Session 7, 3pm
NCEAS is hiring: www.nceas.ucsb.edu/positionsopen
TODO: quote from Woo et al.
reproducibility
R functions and packages
collaboration
communication